

# **Face Detection Accelerator ASIC**

Graduate School of Electronic and Electrical Engineering, Kyungpook National University



Sunghun Jung, Kyeong-Kuk Min, Hayoon Song, and Byungin Moon

#### Abstract

A face detection accelerator serves as a pre-processing stage for a face recognition system. It extracts only the face regions from the entire image, which makes the subsequent face recognition operation more efficient. In this study, the face detection accelerator was designed based on the adaptive boosting (AdaBoost) training algorithm to detect face regions in an input image acquired by a camera [1][2]. The accelerator adopts the image pyramid method to detect faces of various sizes in the input image, and its detection performance can be improved by updating internal parameters through I<sup>2</sup>C-based communication. In addition, it is designed for a small area and low power consumption. When such an accelerator is implemented as an application-specific integrated circuit (ASIC), its usefulness increases for the many edge devices that require low-power operation. For this reason, the AdaBoost-based face detection accelerator was implemented as an ASIC in the Samsung 28-nm process.
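The image pyramid method can be sketched as follows: the input frame is repeatedly downscaled, and a fixed-size detection window is scanned over every level, so a face larger than the window is found at a coarser level. This is a minimal Python model, not the hardware scaler. The 24 × 24 window (the size used in the original Viola-Jones detector) and the scale factor of 1.2 between levels are assumptions, since neither value is given here.

```python
def pyramid_levels(width, height, window=24, scale=1.2):
    """List (level, width, height) for every pyramid level that still fits the window.

    window and scale are illustrative assumptions, not the chip's parameters.
    """
    levels = []
    level = 0
    w, h = width, height
    while w >= window and h >= window:
        levels.append((level, w, h))
        level += 1
        # Each level is the original frame shrunk by scale**level (truncated).
        w = int(width / scale ** level)
        h = int(height / scale ** level)
    return levels


levels = pyramid_levels(1920, 1080)
total = sum(w * h for _, w, h in levels)
print(len(levels), levels[-1])                # 21 (20, 50, 28)
print(round(total / (1920 * 1080), 2))        # pixels scanned relative to one frame
```

Under these assumptions the detector visits 21 levels and a little over three times as many pixels as a single 1920 × 1080 frame, which gives a sense of the extra work the pyramid adds.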

#### Hardware Architecture

As shown in Figure 1, the hardware architecture consists of nine modules: 1) input/output interface, 2) frame buffer, 3) scaler, 4) address generator, 5) line buffer, 6) integral image generator, 7) cascade classifier, 8) merger, and 9) I<sup>2</sup>C module.
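The integral image generator (module 6) produces, for every pixel, the sum of all pixels above and to its left, so the rectangle sum behind any Haar-like feature needs only four memory reads. The Python model below illustrates this, together with one way the word length can be shortened: because rectangle sums are differences of integral-image entries, storing each entry modulo 2<sup>k</sup> still gives exact sums whenever the true rectangle sum fits in k bits. This modular-arithmetic approach is an assumption chosen for illustration and is not necessarily the word length reduction method of [1]; the 24 × 24 window and 8-bit pixels are also assumed.

```python
import random


def integral_image(img, bits=None):
    """Integral image with a zero border: ii[y][x] = sum of img[0:y][0:x].

    If bits is given, every entry is kept modulo 2**bits to model a reduced
    word length (illustrative assumption; not necessarily the method of [1]).
    """
    h, w = len(img), len(img[0])
    mask = (1 << bits) - 1 if bits else None
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            v = ii[y][x + 1] + row_sum
            ii[y + 1][x + 1] = v & mask if mask is not None else v
    return ii


def rect_sum(ii, x, y, w, h, bits=None):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups."""
    s = ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]
    # Modular wrap-around cancels out as long as the true sum fits in `bits`.
    return s % (1 << bits) if bits else s


random.seed(0)
H, W = 48, 64
img = [[random.randrange(256) for _ in range(W)] for _ in range(H)]

WINDOW = 24                                   # assumed detection window size
bits = (WINDOW * WINDOW * 255).bit_length()   # bits needed for the largest window sum
full = integral_image(img)
reduced = integral_image(img, bits)

for x, y, w, h in [(0, 0, 24, 24), (10, 5, 12, 7), (40, 24, 24, 24)]:
    assert rect_sum(full, x, y, w, h) == rect_sum(reduced, x, y, w, h, bits)

print(bits, (1920 * 1080 * 255).bit_length())   # 18 29
```

Under these assumptions an 18-bit word suffices for window-sized rectangles, whereas a full-precision integral image of an 8-bit 1920 × 1080 frame needs 29 bits per entry. The sketch is not meant to reproduce the 50% saving reported for [1].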

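The cascade classifier (module 7) applies AdaBoost-trained stages in sequence and discards a window as soon as one stage rejects it, so most of the work typically falls on the first stage, which is why reducing its computation pays off. The following is a minimal software sketch under stated assumptions: weak classifiers are plain Python callables returning a weighted vote, the stages and thresholds are hypothetical, and neither the two-step split nor the skip scheme of [2] is modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Stage:
    """One AdaBoost-trained stage: weighted weak-classifier votes vs. a threshold."""
    weak: List[Callable[[float], float]]  # each returns its weighted vote
    threshold: float


def evaluate_cascade(window: float, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Return (is_face, number_of_stages_evaluated) for one window."""
    for i, stage in enumerate(stages):
        score = sum(h(window) for h in stage.weak)
        if score < stage.threshold:
            return False, i + 1  # early rejection: later stages are never run
    return True, len(stages)


# Toy usage: a "window" is reduced to its mean brightness (hypothetical feature).
stages = [
    Stage(weak=[lambda m: 1.0 if m > 50 else 0.0], threshold=1.0),
    Stage(weak=[lambda m: 1.0 if m < 200 else 0.0,
                lambda m: 0.5 if m > 80 else 0.0], threshold=1.5),
]
print(evaluate_cascade(30, stages))   # (False, 1)
print(evaluate_cascade(60, stages))   # (False, 2)
print(evaluate_cascade(120, stages))  # (True, 2)
```

The second element of the result counts evaluated stages, which is the quantity an early-rejecting cascade keeps small for non-face windows.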
In the integral image generator module, the word length reduction method [1] was applied, reducing memory usage by 50% compared with the conventional integral image generation method. To improve processing speed, the cascade classifier module is divided into two steps, which reduces the computation of the first stage, and it adopts a skip scheme [2] to eliminate unnecessary operations. In addition, detection performance can be improved by updating parameters through the I<sup>2</sup>C module.

#### ASIC Design

To develop the ASIC, we designed the hardware architecture in Verilog HDL and verified its operation in a real environment on an FPGA. Table 1 shows the design specifications of the face detection accelerator. The gate count is the result of synthesis with Design Compiler, and the power was reported by IC Compiler II (ICC2) after layout. The final chip layout is shown in Figure 2.
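The memory entries in Table 1 can be tallied to give a sense of the on-chip SRAM budget. The short calculation below transcribes the table; reading each entry as words × bits per word, and the remark about the frame buffer, are assumptions for illustration.

```python
# (count, words, bits per word) transcribed from Table 1 (assumed words x bits)
SRAMS = [
    (64, 32512, 16),
    (21, 640, 8),
    (5, 432, 60),
    (5, 432, 26),
    (5, 432, 18),
    (1, 432, 15),
]

total_bits = sum(n * words * width for n, words, width in SRAMS)
print(f"{total_bits:,} bits = {total_bits / 8 / 2**20:.2f} MiB")  # 33,630,928 bits = 4.01 MiB

# Hypothetical reading: if the 64 large SRAMs held one frame, one word per pixel,
# their combined word count would just cover an FHD frame.
frame_words = 64 * 32512
pixels = 1920 * 1080
print(frame_words, pixels, frame_words >= pixels)  # 2080768 2073600 True
```

With this reading the chip holds about 33.6 Mbit (roughly 4 MiB) of SRAM, almost all of it in the 64 largest macros; whether those macros form the frame buffer is not stated here.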

#### Test Environment for Chip Operation Verification

Figure 3 shows the test environment for verifying the chip operation. The ASIC chip is mounted on a chip test board, which is connected to the SoC platform board through an FPGA Mezzanine Card (FMC) interconnect board. The SoC platform board transmits the face detection parameter values to the ASIC through I<sup>2</sup>C-based communication. It also receives a test image from the camera interface board, sends it to the ASIC test board, and receives the computed image from the ASIC. The SoC platform board then transmits the computed image to a personal computer (PC) through the camera interface board. Finally, the operation of the ASIC chip is verified by inspecting the computed image on the PC.

Figure 1. Hardware architecture of the face detection accelerator

Table 1. Design specifications of the face detection accelerator

| Item                   | Specification            |
|------------------------|--------------------------|
| Image resolution       | 1920 × 1080              |
| Frequency              | 148.5 MHz                |
| Memory (words × bits)  | 64 SRAMs of 32512 × 16   |
|                        | 21 SRAMs of 640 × 8      |
|                        | 5 SRAMs of 432 × 60      |
|                        | 5 SRAMs of 432 × 26      |
|                        | 5 SRAMs of 432 × 18      |
|                        | 1 SRAM of 432 × 15       |
| Gate count             | 1,582,252                |

Figure 3. Test environment for verifying chip operation

#### Conclusion

In this study, we developed an ASIC face detection accelerator that adopts word length reduction, a skip scheme, a two-step classifier structure, and an I<sup>2</sup>C-based communication module. Verification using the SoC platform board confirmed that 69 of the 81 ASIC chips can update their internal parameters through I<sup>2</sup>C-based communication and can operate at 15 frames per second (fps) at FHD resolution.
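The reported frame rate can be related to the clock in Table 1 with a short calculation. 148.5 MHz is the standard pixel clock for 1080p at 60 Hz (a total raster of 2200 × 1125 samples per frame, including blanking, at 60 frames per second). The per-pixel cycle budget below assumes, for illustration only, that all 1920 × 1080 active pixels of each frame are processed in the 148.5 MHz clock domain.

```python
CLOCK_HZ = 148_500_000      # operating frequency from Table 1
WIDTH, HEIGHT = 1920, 1080  # FHD input resolution
FPS = 15                    # frame rate reported in the conclusion

# 1080p60 raster: 2200 x 1125 total samples per frame (including blanking) at 60 Hz
raster_rate = 2200 * 1125 * 60
cycles_per_frame = CLOCK_HZ // FPS
cycles_per_pixel = CLOCK_HZ / (WIDTH * HEIGHT * FPS)

print(raster_rate == CLOCK_HZ)       # True
print(f"{cycles_per_frame:,}")       # 9,900,000
print(f"{cycles_per_pixel:.2f}")     # 4.77
```

Under this assumption, each frame at 15 fps has about 9.9 million clock cycles, or under five cycles per input pixel, within which all pyramid levels and cascade stages must be processed.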

However, as shown in Figure 4, the output signal intermittently dropped to 0 in all ASIC chips, and we presume that internal signals are affected as well, not only the output signal. As a result, several vertical lines appeared in the output image, as shown in Figure 5, and the face detection operation, which requires a long computation, could not be performed completely.
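When the computed image is checked on the PC, vertical-line artifacts of the kind shown in Figure 5 can be located automatically by comparing the chip output against a reference image. The following is a hypothetical sketch, assuming a bit-accurate golden image from a software or FPGA model is available; the function name `zeroed_columns` and the 50% row threshold are illustrative, and the check only locates affected columns without diagnosing the cause.

```python
from typing import List


def zeroed_columns(chip: List[List[int]], golden: List[List[int]],
                   min_fraction: float = 0.5) -> List[int]:
    """Columns where the chip output is 0 but the golden output is not,
    in at least min_fraction of the rows (threshold is an assumption)."""
    h, w = len(chip), len(chip[0])
    flagged = []
    for x in range(w):
        bad = sum(1 for y in range(h) if chip[y][x] == 0 and golden[y][x] != 0)
        if bad >= min_fraction * h:
            flagged.append(x)
    return flagged


golden = [[100 + x for x in range(8)] for _ in range(6)]
chip = [row[:] for row in golden]
for y in range(6):
    chip[y][3] = 0            # simulate an output stuck at 0 along column 3
chip[2][6] = 0                # a single dropped pixel is not flagged
print(zeroed_columns(chip, golden))  # [3]
```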

Figure 4. I/O signals observed with a logic analyzer

Figure 5. Output image of ASIC

#### Reference

[1] J. Kim, J. Hyun, and B. Moon, "Low-cost Hardware Architecture for Integral Image Generation using Word Length Reduction," in Proc. Int. SoC Design Conf. (ISOCC), pp. 119-120, 2020.

[2] J. Hyun, J. Kim, C.-H. Choi, and B. Moon, "Hardware Architecture of a Haar Classifier Based Face Detection System Using a Skip Scheme," in Proc. Int. Symp. Circuits and Systems (ISCAS), pp. 1-4, 2021.

#### Acknowledgement

The chip fabrication and EDA tools were supported by the IC Design Education Center (IDEC), Korea.



This research was supported by the National R&D Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (2020M3H2A107804514).